Polymorphisms within DIO2 and GADD45A genes increase the risk of liver disease progression in chronic hepatitis B carriers

The study enrolled 284 patients with chronic hepatitis B virus infection. Participants had mild fibrotic lesions (32.5%), moderate to severe fibrotic lesions (27.5%), cirrhotic lesions (22%), hepatocellular carcinoma (HCC; 5%), or no fibrotic lesions (13%). Eleven SNPs within DIO2, PPARG, ATF3, AKT3, GADD45A, and TBX21 were genotyped by mass spectrometry. The rs225014 TT (DIO2) and rs10865710 CC (PPARG) genotypes were independently associated with susceptibility to advanced liver fibrosis. However, cirrhosis was more prevalent in individuals with the GADD45A rs532446 TT and ATF3 rs11119982 TT genotypes. In addition, the rs225014 CC variant of DIO2 was more frequently found in patients with a diagnosis of HCC. These findings suggest that the above SNPs may play a role in HBV-induced liver damage in a Caucasian population.

www.nature.com/scientificreports/ Moreover, systematic localization of common disease-associated variation has shown that nearly 60% of non-coding GWAS SNPs and other variants are located within DNase I hypersensitive sites (DHSs), which serve key roles in the regulation of gene transcription as markers of cis-regulatory elements (CREs) 15 . Because DHS profiles reflect the occupancy of DNA-binding proteins such as transcription factors (TFs), these loci may alter the transcription factor binding site (TFBS) or induce variation in gene expression 1 . In this study, we have focused on SNPs within TFs and TFBSs of genes associated with the HBV lifecycle, which have been previously associated with multifactorial diseases or traits.

Results
Study group characteristics. The study group consisted of 284 chronic hepatitis B (CHB) patients: 92 with mild fibrosis, 78 with moderate to severe fibrosis, 63 with liver cirrhosis (LC), 13 with hepatocellular carcinoma (HCC), and 38 with no fibrosis. Table 1 summarizes the distribution of variables evaluated at study inclusion. Among CHB patients, the overall proportion with liver cirrhosis was 22.18% (63/284) and the proportion with HCC was 4.6% (13/284), with male patients outnumbering females in both subgroups. The mean age of patients with liver cirrhosis was 61 years; they were significantly older than patients with no fibrosis (p = 0.000146) and patients with fibrosis (p = 0.000032). No significant differences in clinical parameters were observed between patients with mild and moderate to severe liver fibrosis. Aspartate aminotransferase (AST) (p = 0.016084) and total cholesterol (TC) (p = 0.014987) levels were higher in individuals with liver cirrhosis than in the no fibrosis group. In addition, portal hypertension (HT) and thrombocytopenia (platelet count below 150,000) were much more prevalent in the cirrhotic group.
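The subgroup shares quoted for the study group follow directly from the reported counts. As a minimal sketch (the dictionary keys and the helper name `percentage` are illustrative and not taken from the study), the arithmetic can be re-derived as follows:

```python
# Subgroup sizes as reported in the study group description.
counts = {
    "no fibrosis": 38,
    "mild fibrosis": 92,
    "moderate to severe fibrosis": 78,
    "liver cirrhosis": 63,
    "HCC": 13,
}

# The subgroup sizes should add up to the number of enrolled patients.
total = sum(counts.values())


def percentage(n: int, denominator: int) -> float:
    """Return n as a percentage of denominator, rounded to one decimal place."""
    return round(100.0 * n / denominator, 1)


# Share of each subgroup in the whole cohort.
shares = {group: percentage(n, total) for group, n in counts.items()}

for group, share in shares.items():
    print(f"{group}: {counts[group]}/{total} = {share}%")
```

Rounding to one decimal place is a presentation choice made here; the text itself mixes one and two decimals.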
DNA samples from all subjects included in the study were successfully analyzed, and high-quality genotyping data were generated for all eleven SNPs. In the liver fibrosis, cirrhosis, and no fibrosis groups, genotype distributions deviated from Hardy-Weinberg equilibrium (HWE), except for rs225014, rs2016520, and rs4794067 in cirrhotic patients and rs2016520 in the no fibrosis group, which were consistent with HWE (p > 0.5). Surprisingly, in the HCC group only rs12031994 (AKT3) deviated from HWE. Evaluation of the linkage disequilibrium (LD) pattern using the correlation coefficient r² between pairs of analyzed SNPs showed that all of them were independent (r² < 0.5). The distribution of SNP genotypes was compared between the no fibrosis, fibrosis, cirrhosis, and HCC groups (Table 2). Significant differences in genotype distribution were observed for rs225014 (DIO2) and rs4794067 (TBX21) between groups of patients affected by different HBV-related liver diseases (Table 2). The rs225014 TT genotype was more common in patients with no fibrosis (52%) than in the cirrhosis group (44%), and its frequency dropped to 8% in patients with HCC. On the other hand, the
Gene polymorphisms and liver aminotransferase levels. In a univariate correlation analysis, ALT levels correlated with sex (p = 0.015), thrombocytopenia (p = 0.008), and HBV DNA levels (p = 0.03). Among the SNPs, we observed significant associations between ALT concentration and the DIO2, GADD45A, and AKT3 genotypes (Table 3). The minor allele at rs225014 and rs12031994 and the major allele at rs225017 and rs532446 were more common in patients with elevated ALT levels. Next, a multivariate regression analysis was used to identify independent predictors of ALT levels in our patients with HBV infection, with serum ALT activity as the dependent variable.
The results of this analysis showed that thrombocytopenia, rs225014 TT, rs12031994 TT, and rs532446 CC were independently associated with ALT levels (Table 4).
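The Statistical analysis section states that genetic factors were evaluated under dominant, recessive, and additive models. The sketch below shows one common way to turn a biallelic genotype into a numeric predictor for each model. The function name `encode_genotype`, the choice of "C" as the effect allele, and the example genotypes are assumptions made for illustration; the study does not describe its own coding scheme.

```python
def encode_genotype(genotype: str, effect_allele: str, model: str) -> int:
    """Code a biallelic genotype such as "CT" as a numeric predictor.

    additive  -> number of copies of the effect allele (0, 1 or 2)
    dominant  -> 1 if at least one copy is present, else 0
    recessive -> 1 only if both alleles are the effect allele, else 0
    """
    if len(genotype) != 2:
        raise ValueError(f"expected a two-letter genotype, got {genotype!r}")
    copies = genotype.upper().count(effect_allele.upper())
    if model == "additive":
        return copies
    if model == "dominant":
        return int(copies >= 1)
    if model == "recessive":
        return int(copies == 2)
    raise ValueError(f"unknown genetic model: {model!r}")


# Illustrative genotypes only, not data from the study.
example_genotypes = ["TT", "CT", "CC"]
for model in ("additive", "dominant", "recessive"):
    coded = [encode_genotype(g, effect_allele="C", model=model)
             for g in example_genotypes]
    print(model, coded)
```

The coded values could then be passed as a covariate to any regression routine; which allele is treated as the effect allele changes the sign of the estimate but not the test itself.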
Genetic polymorphisms and the liver fibrosis progression in chronic hepatitis B. We next assessed the association between the analyzed SNPs and liver fibrosis progression. The distribution of the rs225014 T allele differed significantly between the fibrosis score F0 group and F1 (p = 0.003), F2 (p = 0.012), F3 (p = 0.0002), and F4 (p = 0.0003) patients. Significant differences in genotype occurrence between the F0 group and the other F score groups were also found for PPARG rs10865710 (p = 0.028) and TBX21 rs4794067 (p = 0.028, Fig. 1). In addition, the GADD45A rs532446 TT genotype was more common in the F0 group than in the F4 group.
HCC was detected in 13 of 284 (4.6%) CHB patients. No association was found between the analyzed SNPs and the presence of HCC. However, the genotype distribution of DIO2 rs225014 differed between patients with cirrhosis who had developed primary liver malignancy and those without HCC (p = 0.010426). The rs225014 CC variant was identified in 38% of patients with HCC and 12% of cirrhotic patients without HCC.
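With a subgroup as small as the 13 HCC patients, a comparison of genotype frequencies between two groups is the kind of situation in which Fisher's exact test, named in the Statistical analysis section, is typically preferred over the chi-squared test. The text does not say which test produced the p-value above, so the following is only a sketch of a two-sided Fisher's exact test on a 2x2 table. The function name and the example counts are hypothetical and are not the study's data.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be two patient groups and columns carriers / non-carriers
    of a genotype. The p-value sums the hypergeometric probabilities of all
    tables with the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # Probability of x carriers in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for illustration only (not taken from the study):
# group 1: 5 carriers, 8 non-carriers; group 2: 6 carriers, 44 non-carriers.
p_value = fisher_exact_two_sided(5, 8, 6, 44)
print(f"two-sided p = {p_value:.4f}")
```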
In silico trial results. Using the SIFT algorithm, the substitution at position 92 from T to A was predicted to be tolerated, with a score of 0.51. Median sequence conservation was 3.50. The HOPE report showed that the mutant residue is smaller and more hydrophobic than the wild-type residue, and the variant's MetaRNN score was 2.324709e-05. Furthermore, rs225014 was analyzed with the I-Mutant 3.0 and MUpro servers. The free energy change (∆∆G) values were below −0.5 kcal/mol (∆∆G = −1.30 for I-Mutant 3.0; ∆∆G = −1.4718185 for MUpro), which indicates that the mutation can largely destabilize the DIO2 protein.
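The calls above rest on two simple cutoffs: a SIFT score at or below 0.05 is read as deleterious (as stated in the Methods), and a ∆∆G below −0.5 kcal/mol is read as destabilizing. The sketch below applies these cutoffs to the reported values. The function names and the label used for values that do not pass the ∆∆G cutoff are assumptions made for illustration; the servers themselves report richer output.

```python
SIFT_DELETERIOUS_CUTOFF = 0.05    # scores at or below this are called deleterious
DDG_DESTABILIZING_CUTOFF = -0.5   # kcal/mol; values below this are called destabilizing


def classify_sift(score: float) -> str:
    """Label a SIFT score (range 0 to 1) using the 0.05 cutoff."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"SIFT score must lie between 0 and 1, got {score}")
    return "deleterious" if score <= SIFT_DELETERIOUS_CUTOFF else "tolerated"


def classify_ddg(ddg: float) -> str:
    """Label a predicted free energy change using the -0.5 kcal/mol cutoff."""
    if ddg < DDG_DESTABILIZING_CUTOFF:
        return "destabilizing"
    # Label chosen for this sketch; the source only describes the case below -0.5.
    return "not clearly destabilizing"


# Values reported in the text for the Thr92Ala substitution (rs225014).
reported = {"SIFT": 0.51, "I-Mutant 3.0": -1.30, "MUpro": -1.4718185}

print("SIFT:", classify_sift(reported["SIFT"]))
for tool in ("I-Mutant 3.0", "MUpro"):
    print(f"{tool}:", classify_ddg(reported[tool]))
```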
For two SNPs analyzed with RegulomeDB, the predicted rank was 5, which suggested that these SNPs have a minimal probability of affecting TF binding and/or a DNase peak (Table 7). Remarkably, the highest evidence of regulatory function was shown for rs225014, rs10865710, rs532446, and rs4794067. RegulomeDB revealed that rs10865710 is linked to PPARG and TIMP4 expression, may affect JUN protein binding, and falls within NFATC1, NFATC3, NFATC4, and NFAT5 binding motifs. With the same RegulomeDB rank, rs532446 was shown to affect numerous different proteins (Supplementary Table S2) and is localized within ATF4 and PRDM binding motifs. Similarly, rs225014 was demonstrated to affect target gene expression and the binding of a variety of proteins (Supplementary Table S3). Additionally, rs4794067 was shown to affect the expression of multiple genes (Supplementary Table S4) and to influence EZH2 and CTCF binding.
The histone modification analysis showed that rs10865710, rs532446, and rs4794067 were predicted to lie within enhancer histone marks (liver, endocrine gland, exocrine gland). The key results of the histone modification analysis restricted to the liver are shown in Table 8. More detailed information can be found in Table S5. Furthermore, miRNASNP analysis demonstrated that all SNPs may influence the recognition and targeting of miRNA (Table 9).

Discussion
It is well established that multiple risk factors contribute to cirrhosis and HCC development in CHB patients 16 . Apart from well-known risk factors such as older age, male gender, chronic active hepatitis, higher ALT levels, or a history of decompensation, the accumulation of genetic alterations during progression from health through fibrosis to HCC is now considered of great importance 17 . In this study, we focused on genetic polymorphisms within transcription factor binding sites, which have recently been suggested as important players in downstream gene expression and in phenotypic variation predisposing to different diseases 18 . We performed an extensive literature review for candidate SNPs located at TFBSs that were identified by GWAS as contributing to complex disease risk. Afterward, we limited the loci to those with a potential impact on TF regulation associated with hepatitis B and/or liver disease progression. As a result, our study demonstrated that rs225014 (DIO2), rs532446 (GADD45A), rs12031994 (AKT3), rs11119982 (ATF3), and rs10865710 (PPARG) might contribute to the increased risk of liver disease progression in chronic hepatitis B carriers. Other parameters, including metabolic markers such as body mass index (BMI), diabetes, and triglyceride levels, were not significant in our study. Additionally, no literature data regarding the possible role of the investigated SNPs in these variables were found.
The strongest prognostic value was found for rs225014 (DIO2) and rs532446 (GADD45A), which were correlated with liver tissue scarring as well as with elevated ALT. The CC genotype or C allele of DIO2 rs225014 occurred more frequently in patients with higher ALT levels and with more advanced liver fibrosis. Consequently, the C allele had a risk effect for liver disease progression, as it was more common in cirrhotic (56%) and HCC (92%) patients. In the same manner, the C allele at rs532446 of GADD45A was more common in CHB carriers with both raised ALT concentrations and liver cirrhosis. On the other hand, the TT genotype at both rs225014 (DIO2) and rs532446 (GADD45A) had a protective effect on liver scarring progression. Additionally, the genotype distribution differed significantly for rs225014 (DIO2) between groups of patients affected by different stages of HBV-related liver disease, and between the cirrhosis and fibrosis groups for rs532446 (GADD45A). Furthermore, analysis of the functional mechanisms of these SNPs using computational approaches demonstrated their influence on miRNA binding, target gene expression levels, and the binding of different proteins. To the best of our knowledge, this is the first report presenting an association between polymorphisms of the above genes and the severity of liver disease in CHB patients.
Rs225014 (DIO2), also known as Thr92Ala, is involved in thyroid hormone (TH) metabolism and its regulation 19 . This polymorphism was demonstrated to have an impact on TH levels and may therefore influence a variety of clinical aspects as well as quality of life or cognition. DIO2 SNP rs225014 has so far been associated with symptomatic osteoarthritis 20,21 , type 2 diabetes mellitus 22 , atherosclerosis 23 , and bone mineral density 24 , with the C allele demonstrated as a risk factor. On the other hand, in contrast to our results, the C allele at rs225014 was protective in response to lung injury 25 . Although DIO2 is not typically expressed in the liver, it has been shown that the lack of neonatal DIO2 in mouse hepatocytes leads to hepatic epigenetic reprogramming that can alter different liver functions, modifying susceptibility to alcohol- or diet-induced hepatic steatosis, hypertriglyceridemia, and obesity 26,27 . This may be explained by the fact that the liver is susceptible to the dynamics of THs, which participate in hepatic homeostasis. As the liver is one of the main target tissues of TH, any disruption of TH signals is closely associated with multiple liver-related diseases 28-30 . Moreover, the rs225014 DIO2 C allele creates a unique TFBS for the NK3 homeobox 2 (NKX3-2) TF, which is eliminated by the T allele 20,31 . Because homeobox genes are known players in the regulation of HCC tumorigenesis, the elimination of the NKX3-2 binding site by the T allele may in part be associated with the protection against liver disease progression.
Table 7. RegulomeDB results for SNPs within selected regions. 1f: eQTL + TF binding/DNase peak; 5: TF binding/DNase peak. The RegulomeDB probability score ranges from 0 to 1; the higher the score, the more likely the variant is regulatory.
Furthermore, NKX3-2 (also known as BAPX1) has already been demonstrated as a poor prognostic factor for gastric cancer in vivo 32 . The BAPX1 gene was also reported to be up-regulated in breast and prostate cancers at the mRNA level 33 . GADD45A, a TP53-regulated and DNA damage-responsive protein, plays a leading role in human tumorigenesis. Although the exact mechanism remains uncertain, the expression patterns of GADD45A vary in different
In the current study, we also observed an association between the enhancer polymorphism rs10865710 in the PPARG gene and liver fibrosis progression. Although a C → G substitution at this site does not cause an amino acid change, rs10865710 was proposed as a risk factor for a variety of diseases 51 , such as asthma 52 , systemic sclerosis 53 , obesity 54 , and non-alcoholic fatty liver disease 55 . Moreover, Lu et al. 51 have recently demonstrated that carriers of the rs10865710 CG/GG genotypes express lower levels of PPARG than individuals with the CC genotype, which may be associated with the downregulation of PPARG expression. Additionally, hepatic PPARG expression has been noted to promote liver steatosis 56 , and inhibition of PPARG has been shown to suppress steatosis-associated liver cancer in mice 57 . In our study, the rs10865710 minor allele, which is associated with lower PPARG levels, was more common in patients with low fibrosis scores. Of the six unique TFBSs generated by the G allele, MEIS1 has already been shown to play a role in cardiovascular regeneration 58 . Because inhibitory effects of MEIS1 on tumorigenesis in renal clear cell carcinoma 59 , non-small-cell lung cancer cells 60 , and prostate cancer 61 have been reported, we suppose that the creation of the MEIS1 TFBS by the minor G allele may in part be responsible for the association of this SNP with liver fibrosis risk. In the same manner, the rs11119982 (ATF3) C allele, which is associated with cirrhosis risk, creates one unique TFBS for the helicase-like transcription factor (HLTF), which is involved in altering chromatin structure. On the other hand, the minor T allele at this site is located in the binding site of five TFs that regulate transcription and control hematopoietic progenitor cells, cellular transcription, and repression.
This cross-sectional study has some limitations. First, although we analyzed ALT levels within groups with different liver damage scores, these measurements were performed at the time of liver assessment, and we have no information regarding the further progression of liver disease. Given that ALT tends to fluctuate, people with early stages of liver cirrhosis can have normal liver function tests. Second, our study was performed on Caucasian subjects only. Therefore, similar studies in other geographic regions and in genetically different populations should be done.
This study showed that rs225014 (DIO2), rs532446 (GADD45A), rs12031994 (AKT3), rs11119982 (ATF3), and rs10865710 (PPARG) are associated with the increased risk of liver disease progression in chronic hepatitis B carriers. The presence of the C allele at both DIO2 rs225014 and GADD45A rs532446 was independently associated with liver tissue scarring. Moreover, the occurrence of HCC in the study group was more common in individuals carrying the rs225014 CC genotype, and rs532446 together with rs11119982 was associated with liver cirrhosis development.

Materials and methods
Patients. This study included 284 patients with confirmed CHB infection (HBsAg positive for more than 6 months) from the ANRS CO22 HEPATHER cohort (ClinicalTrials.gov registry number: NCT01953458). All the subjects were of Caucasian ethnicity and had no other concomitant liver etiologies (viral coinfection, autoimmune or metabolic). Patients were excluded if they were currently treated or had undergone antiviral treatment within 6 months before the initiation of the study. Serum samples were collected before liver fibrosis assessment, and underwent the standard procedure in the local clinical center laboratory, including hepatitis serologic variables (HBsAg, HBsAb, HBeAg, HBeAb, HBcAb, HBV DNA levels). Fibrosis scores were assessed by non-invasive transient elastography by using FibroScan (Echosens, Paris, France), and the METAVIR scoring system 62 was used for patient classification. Patients were subdivided as follows: no fibrosis (no scarring, stage 0), mild fibrosis (fibrosis stage I), liver fibrosis (fibrosis stages II-III), and cirrhosis (fibrosis stage IV; confirmed by two experienced pathologists). The procedures employed followed the ethical standards of the 1975 Declaration of Helsinki revised in 2013. The study protocol was approved by the Local Independent Bioethics Committee and the ANRS CO22 HEPATHER scientific committee. We have received the agreement to use the HEPATHER  Table S6. Out of the nine SNPs included in the study, only one was localized within the transcription factor (TBX21, rs4794067). The remaining SNPs identified at the transcription factor binding site (TFBS) include DIO2 (rs225017, rs225014), PPARG (rs10865710, rs2016520), ATF3 (rs11119982), AKT3 (rs12031994), and GADD45A (rs532446, rs37834688). 
41 µL of ultrapure water was used to dilute the final extension product, followed by transfer into the Chip Prep Module (Agena Bioscience, San Diego, CA, USA) for automated sample handling, including desalting and dispensing of samples onto the SpectroChip Array (Agena Bioscience, San Diego, CA, USA). Mass spectra were acquired with a MassARRAY® Analyzer 4 mass spectrometer and analyzed with MassARRAY® Typer 4.0 software. All procedures were performed according to the manufacturer's recommendations.
Statistical analysis. Statistical analyses were performed using STATISTICA software version 13.3 (StatSoft, Tulsa, OK, USA). Hardy-Weinberg equilibrium of the analyzed SNPs was tested with the MIDAS software. The chi-squared or Fisher's exact test was used to analyze relationships between categorical variables. Logistic regression analysis was used to evaluate the contribution of genetic and nongenetic factors under the dominant, recessive, and additive models. A backward stepwise regression approach was applied when building multivariate models. All p-values presented were two-sided, and only p < 0.05 was considered significant.
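To illustrate what a Hardy-Weinberg equilibrium check involves, the sketch below runs a standard chi-squared goodness-of-fit test with one degree of freedom on the genotype counts of a single biallelic SNP. This is a generic textbook formulation and is not claimed to be the procedure implemented in MIDAS; the function name and the example counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float]:
    """Chi-squared test for Hardy-Weinberg equilibrium at a biallelic SNP.

    n_aa, n_ab and n_bb are the observed counts of the two homozygous
    genotypes and the heterozygous genotype. Returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    # Allele frequencies estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    # Genotype counts expected under Hardy-Weinberg proportions.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for illustration only (not study data).
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

For small groups, such as the 13 HCC patients here, an exact HWE test is often preferred over the chi-squared approximation.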
Bioinformatics analysis of statistically significant SNPs. Four software tools were used to analyze the effect of rs225014 on the DIO2 protein. The SIFT web server (https://sift.bii.a-star.edu.sg/www/SIFT_seq_submit2.html) was used to predict the SNP's impact on protein function based on sequence homology and the physical properties of amino acids. A score of 0.05 or below, on a scale from 0 to 1, indicated a deleterious effect of the SNP on protein function. The MUpro (http://mupro.proteomics.ics.uci.edu/) and I-Mutant 3.0 (http://gpcr2.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi) web tools were used to determine whether the Thr92Ala amino acid substitution affects the stability of the DIO2 protein. Structural and functional effects of rs225014 were analyzed with the HOPE (Have (y)Our Protein Explained) server (https://www3.cmbi.umcn.nl/hope/). The MetaRNN pathogenicity prediction score (range 0-1) was also used; higher scores indicate higher pathogenicity.
Potential harmful effects of non-coding SNPs were investigated with RegulomeDB v2.1 (https://beta.regulomedb.org/regulome-search//), which gives a ranking based on DNA binding and provides ChIP data, chromatin states, and motifs. The RegulomeDB probability score ranges from 0 to 1, with 1 being the most likely to be a regulatory variant. Furthermore, to predict the target gain/loss effect of SNPs in miRNA seed regions, miRNASNP was employed (miRNASNP-v3 (hust.edu.cn)).

Data availability
The datasets generated and analyzed during the current study are available in the BioStudies database, S-BSST1042.